The invention relates to a motion estimation unit for estimating a current
motion vector for a group of pixels of an image, comprising: 

- generating means for generating a set of candidate motion vectors for the 
group of pixels, with the candidate motion vectors being extracted from a set of previously 
estimated motion vectors; 

- a match error unit for calculating match errors of respective candidate motion
vectors; and

- a selector for selecting the current motion vector from the candidate motion 
vectors by means of comparing the match errors of the respective candidate motion vectors. 

The invention further relates to a method of estimating a current motion vector 
for a group of pixels of an image, comprising: 

- a generating step of generating a set of candidate motion vectors for the 
group of pixels, with the candidate motion vectors being extracted from a set of previously 
estimated motion vectors; 

- a match error step of calculating match errors of respective candidate motion
vectors; and

- a select step of selecting the current motion vector from the candidate motion 
vectors by means of comparing the match errors of the respective candidate motion vectors. 

The invention further relates to an image processing apparatus comprising: 

- receiving means for receiving a signal representing images; 

- such a motion estimation unit; and 

- a motion compensated image processing unit for calculating processed 
images on basis of the images and output of the motion estimation unit. 

The invention further relates to an encoder comprising: 

- such a motion estimation unit; 

- a discrete cosine transformer; 

- a quantizer; and 

- a run-level encoder. 
An embodiment of the method of the kind described in the opening paragraph
is known from the article "True-Motion Estimation with 3-D Recursive Search Block
Matching" by G. de Haan et al. in IEEE Transactions on Circuits and Systems for Video
Technology, vol. 3, no. 5, October 1993, pages 368-379.

For many applications in video signal processing, it is necessary to know the
apparent velocity field of a sequence of images, known as the optical flow. This optical flow
is given as a time-varying motion vector field, i.e. one motion vector field per image pair.
Notice that an image can be part of several image pairs. In the cited article this motion vector
field is estimated by dividing the image into blocks. For a set of candidate motion vectors of
each block, match errors are calculated and used in a minimization procedure to find the most
appropriate motion vector from the set of candidate motion vectors of the block. The match
error corresponds to the SAD: the sum of absolute luminance differences between the pixels in a
block of a current image and the pixels of a block in a reference image shifted by the motion
vector. If the reference image and the current image directly succeed each other, the SAD can
be calculated with:

$$\mathrm{SAD}(x,y,d_x,d_y,n) := \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} \left| Y(x+i,\, y+j,\, n) - Y(x+d_x+i,\, y+d_y+j,\, n+1) \right| \tag{1}$$

Here $(x,y)$ is the position of the block, $(d_x,d_y)$ is a motion vector, $n$ is the image number,
$N$ and $M$ are the width and height of the block, and $Y(x,y,n)$ is the value of the luminance
of a pixel at position $(x,y)$ in image $n$.
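The SAD match error can be illustrated with a minimal Python sketch. The function name `sad`, the list-of-rows image layout and the small test images are assumptions made for this illustration only; boundary handling is omitted.

```python
def sad(cur, ref, x, y, dx, dy, width, height):
    """Sum of absolute luminance differences for one block.

    cur, ref      : 2-D lists of luminance values, indexed image[row][column]
    (x, y)        : top-left position of the block in the current image
    (dx, dy)      : candidate motion vector
    width, height : block width N and block height M
    """
    total = 0
    for j in range(height):
        for i in range(width):
            total += abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
    return total


# Hypothetical 4x4 images: the reference equals the current image
# shifted one pixel to the right.
cur = [[0, 10, 20, 30] for _ in range(4)]
ref = [[0, 0, 10, 20] for _ in range(4)]

error_shift = sad(cur, ref, 1, 1, 1, 0, 2, 2)  # candidate (1, 0) follows the shift
error_zero = sad(cur, ref, 1, 1, 0, 0, 2, 2)   # zero vector ignores the shift
print(error_shift, error_zero)  # prints: 0 40
```

The candidate that follows the actual shift yields the smaller match error, which is the property the minimization procedure relies on.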

The set of candidate motion vectors comprises motion vectors which are
extracted from a set of previously estimated motion vectors and random motion vectors. The
set comprises motion vectors calculated for the same motion vector field as the one to which the
current motion vector under consideration belongs. These motion vectors are called "spatial
candidates". The set might also comprise motion vectors calculated for another motion
vector field. These latter motion vectors are called "temporal candidates". The choice for
"spatial candidates" as motion vector candidates for the current block of pixels under
consideration is based on the assumption that several blocks of pixels correspond to one and
the same object in a scene being imaged. The choice for "temporal candidates" as motion
vector candidates for the current block of pixels under consideration is based on the
assumption that objects in a scene being imaged move with a constant velocity. However,
these assumptions do not always hold. The result is that convergence in finding the
appropriate motion vectors of the motion vector fields is not optimal.

It is an object of the invention to provide a motion estimation unit of the kind
described in the opening paragraph which has a relatively fast convergence in finding the
appropriate motion vectors of the motion vector fields.

The object of the invention is achieved in that the motion estimation unit is
arranged to add a further candidate motion vector to the set of candidate motion vectors by
calculating this motion vector on the basis of a first motion vector and a second motion vector,
both belonging to the set of previously estimated motion vectors. Instead of just taking
motion vectors which are found applicable for other portions of the image or for other
images, candidate motion vectors are now calculated based on multiple motion vectors.

An advantage of the proposed scheme is that it takes into account more
information, which results in a more accurate estimation of candidate motion vectors. The
obtained accuracy in estimation allows a new trade-off point between the number of
candidate motion vectors and the convergence of the accuracy of the motion estimation unit.
This is beneficial for scalable motion estimation schemes.

Another advantage is that different motion models can be taken into account.
Examples of such motion models are most recent velocity, most recent acceleration, zoom or
rotation. The type of motion model determines which previously estimated motion
vectors are used to calculate a candidate motion vector. The first motion vector and the second motion
vector might belong to one and the same motion vector field. Preferably, however, the first motion
vector and the second motion vector belong to different motion vector fields.

The set of candidate motion vectors which is tested to find the current motion
vector might comprise:

- "spatial candidates" extracted from the set of previously estimated motion
vectors;

- "temporal candidates" extracted from the set of previously estimated motion
vectors;

- "multi-temporal candidates" calculated based on multiple "temporal
candidates" extracted from the set of previously estimated motion vectors;

- "multi-spatial candidates" calculated based on multiple "spatial candidates"
extracted from the set of previously estimated motion vectors; and

- random motion vectors.

In an embodiment of the motion estimation unit according to the invention, the
selector is arranged to select, from the set of candidate motion vectors, a particular motion
vector as the current motion vector, if the corresponding match error is the smallest of the
match errors. This is a relatively easy approach for selecting the current motion vector from
the set of candidate motion vectors.
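The selection rule can be sketched as follows. This is only an illustration: the function name `select_motion_vector`, the toy error function and the candidate values are assumptions, and ties are resolved here in favour of the first candidate, which the text does not prescribe.

```python
def select_motion_vector(candidates, match_error):
    """Return (vector, error) of the candidate with the smallest match error."""
    best_vector, best_error = None, None
    for vector in candidates:
        error = match_error(vector)
        if best_error is None or error < best_error:
            best_vector, best_error = vector, error
    return best_vector, best_error


def toy_error(vector):
    """Hypothetical match error: distance to an assumed true motion (2, -1)."""
    return abs(vector[0] - 2) + abs(vector[1] + 1)


candidates = [(0, 0), (2, 0), (2, -1), (3, -1)]
best = select_motion_vector(candidates, toy_error)
print(best)  # prints: ((2, -1), 0)
```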

In an embodiment of the motion estimation unit according to the invention, the
match error unit is designed to calculate a first one of the match errors by means of
subtracting luminance values of pixels of blocks of pixels of respective images of a first
image pair. In this case the group of pixels corresponds with a block of pixels. Preferably the
sum of absolute luminance differences (SAD) is calculated. The SAD is a relatively reliable
measure for correlation which can be calculated relatively fast.

In an embodiment of the motion estimation unit according to the invention, the
first motion vector belongs to a first forward motion vector field and the second motion
vector belongs to a second forward motion vector field, with the first forward motion vector
field and the second forward motion vector field being different. A forward motion vector
field comprises motion vectors which are calculated by comparing a block of pixels of a current
image with blocks of pixels of a reference image which succeeds the current image.
Notice that succeeding does not mean that there are no other images in between the current
and the reference image. Suppose there is a series of images comprising image 0, image 1,
image 2 and image 3, respectively. Then the following forward motion vectors could be
estimated with image 0 as current image: V(0,1), i.e. with image 1 being the reference image,
V(0,2), i.e. with image 2 being the reference image, and V(0,3), i.e. with image 3 being the
reference image. Though the general proposed scheme allows any kind of computation on the
motion vector fields, the focus is on simple to implement, low-cost, element-wise operations,
i.e., the further candidate motion vector is based on two previously calculated motion vectors.
Examples are:

- to calculate the further candidate motion vector by means of subtraction of
the first motion vector from the second motion vector;

- to calculate the further candidate motion vector by means of subtraction of
the second motion vector from the first motion vector; and

- to calculate the further candidate motion vector by means of multiplication of
the second motion vector with a predetermined constant and subtraction of the first motion
vector. Multiplication of a motion vector with a predetermined constant can be implemented
by means of summation.
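The three element-wise operations listed above can be sketched in Python as follows. The helper names and the example vectors are hypothetical; the constant 2 is chosen only for illustration, and multiplication is written as repeated addition to mirror the remark that it can be implemented by means of summation.

```python
def add(a, b):
    """Element-wise sum of two motion vectors."""
    return (a[0] + b[0], a[1] + b[1])


def sub(a, b):
    """Element-wise difference a - b of two motion vectors."""
    return (a[0] - b[0], a[1] - b[1])


def times_by_summation(vector, constant):
    """Multiply a vector by a non-negative integer using additions only."""
    result = (0, 0)
    for _ in range(constant):
        result = add(result, vector)
    return result


first = (2, 1)   # hypothetical first previously estimated motion vector
second = (4, 2)  # hypothetical second previously estimated motion vector

candidate_a = sub(second, first)                         # second - first
candidate_b = sub(first, second)                         # first - second
candidate_c = sub(times_by_summation(second, 2), first)  # 2 * second - first
print(candidate_a, candidate_b, candidate_c)  # prints: (2, 1) (-2, -1) (6, 3)
```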

In an embodiment of the motion estimation unit according to the invention, the
first motion vector belongs to a fourth forward motion vector field and the second motion
vector belongs to a backward motion vector field. A backward motion vector field comprises
motion vectors which are calculated by comparing a block of pixels of a current image with
blocks of pixels of a reference image which precedes the current image. Notice that
preceding does not mean that there are no other images in between the current and the
reference image. Suppose there is a series of images comprising image 0, image 1, image 2
and image 3, respectively. Then the following backward motion vectors could be estimated
with image 3 as current image: V(3,2), i.e. with image 2 being the reference image, V(3,1),
i.e. with image 1 being the reference image, and V(3,0), i.e. with image 0 being the reference
image. The further candidate motion vector might be based on two previously calculated
motion vectors. An example is to calculate the further candidate motion vector by means of
multiplication of the first motion vector with a predetermined constant and summation of the
second motion vector. An advantage of combining motion vectors from forward motion
vector fields and backward motion vector fields is that motion vectors corresponding to
images with a relatively small time difference with the current image can be applied.

It is advantageous to apply an embodiment of the motion estimation unit
according to the invention in a video encoder, e.g. an MPEG encoder. Especially in MPEG
encoders it is common to calculate multiple motion vector fields for an image. These motion
vectors are temporarily stored. Applying some of these multiple motion vector fields to
calculate candidate motion vectors is advantageous. In MPEG encoders it is known to
calculate candidate motion vectors by means of scaling a single previously estimated motion
vector. In some cases, the calculation of multi-temporal estimates is of lower computational
complexity than scaling motion vectors. Whereas scaling requires multiplication with
complicated factors (not easily decomposed into simple binary shift and add operations), the
multi-temporal candidate motion vector can be computed with simple shift and add
operations. Modifications of the encoder and variations thereof may correspond to
modifications and variations thereof of the motion estimation unit described.
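To make the shift-and-add remark concrete, the following sketch computes a candidate of the form "three times a forward vector plus a backward vector" without any multiplication. It assumes integer vector components (for instance in pixel or sub-pixel units); the function names and values are hypothetical.

```python
def times_three(vector):
    """3 * v computed as (v << 1) + v, i.e. one shift and one addition per component."""
    return ((vector[0] << 1) + vector[0], (vector[1] << 1) + vector[1])


def multi_temporal_candidate(forward, backward):
    """3 * forward + backward using only shifts and additions."""
    tripled = times_three(forward)
    return (tripled[0] + backward[0], tripled[1] + backward[1])


forward = (2, -1)   # hypothetical forward vector over one image interval
backward = (-2, 1)  # hypothetical backward vector over one image interval
print(multi_temporal_candidate(forward, backward))  # prints: (4, -2)
```

By contrast, scaling a single vector to another temporal distance generally involves a ratio of image distances, which does not always reduce to a handful of shifts and additions.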

A multi-temporal candidate can be calculated based on two or more previously
estimated motion vectors. The type of calculation for the multi-temporal candidate depends
on which of the previously estimated motion vectors are available. The type of calculation
can be controlled by the time differences between the current image and the available
previously estimated motion vectors. Another parameter which can influence the
selection of a previously estimated motion vector is the match error of the previously
estimated motion vector. Knowledge of the apparent motion model is also relevant.

It is advantageous to apply an embodiment of the motion estimation unit
according to the invention in an image processing apparatus as described in the opening
paragraph. The image processing apparatus may comprise additional components, e.g. a
display device for displaying the processed images or storage means for storage of the
processed images. The motion compensated image processing unit might support one or
more of the following types of image processing:

- De-interlacing: Interlacing is the common video broadcast procedure for
transmitting the odd or even numbered image lines alternately. De-interlacing attempts to
restore the full vertical resolution, i.e. make odd and even lines available simultaneously for
each image;

- Up-conversion: From a series of original input images a larger series of
output images is calculated. Output images are temporally located between two original input
images; and

- Temporal noise reduction: This can also involve spatial processing, resulting
in spatial-temporal noise reduction.

Modifications of the image processing apparatus and variations thereof may correspond to
modifications and variations thereof of the motion estimation unit described.

These and other aspects of the motion estimation unit, of the encoder, of the
method and of the image processing apparatus according to the invention will become
apparent from and will be elucidated with respect to the implementations and embodiments
described hereinafter and with reference to the accompanying drawings, wherein:

Fig. 1 schematically shows the relations between a number of consecutive
images and motion vectors;

Figs. 2A, 2B, 2C and 2D schematically show examples of relations between
motion vectors belonging to a moving object, in order to illustrate that a multi-temporal
candidate motion vector can be calculated by means of two previously estimated motion
vectors;

Fig. 3 schematically shows the relations between motion vectors and a number
of consecutive pictures as known in MPEG encoding;

Fig. 4 schematically shows a portion of a motion vector field;

Fig. 5 schematically shows an embodiment of a motion estimation unit,
according to the invention;

Fig. 6 schematically shows an embodiment of a video encoder, comprising a
motion estimation unit, according to the invention; and

Fig. 7 schematically shows elements of an image processing apparatus,
comprising a motion estimation unit, according to the invention.
Corresponding reference numerals have the same meaning in all of the Figs.


Fig. 1 schematically shows the relations between a number of consecutive
images 0,1,2,3,4,5 and motion vectors V(c,r) with c ∈ {0,3} and r ∈ {0,1,2,3,4,5}. The syntax
is as follows. For example, a forward motion vector related to an image pair comprising
image 0 and image 1 is denoted as V(0,1). A backward motion vector related to an image
pair comprising image 2 and image 3 is denoted as V(3,2). In principle, other values of c and
r are possible.

Fig. 2A schematically shows an example of a relation between motion vectors
belonging to a moving object 200. It illustrates that a multi-temporal candidate motion
vector V(3,4) can be calculated by means of two previously estimated motion vectors.
Assume that the following motion vectors have already been estimated: V(0,2) and V(0,3).
Now a multi-temporal candidate motion vector V(3,4) has to be calculated. This can be
achieved by applying Equation 2:

$$V(3,4) = V(0,3) - V(0,2) \tag{2}$$

This means that V(3,4) is an extrapolated motion vector which is calculated by means of
subtraction of two preceding forward motion vectors.

Fig. 2B schematically shows another example of a relation between motion
vectors belonging to a moving object 200. Assume that the following motion vectors have
already been estimated: V(3,4) and V(3,2). Now a multi-temporal candidate motion vector
V(3,5) has to be calculated. This can be achieved by applying Equation 3:

$$V(3,5) = 3V(3,4) + V(3,2) \tag{3}$$

This means that V(3,5) is an extrapolated motion vector which is calculated by means of
summation of a forward motion vector multiplied by a predetermined constant and a
backward motion vector.

Fig. 2C schematically shows another example of a relation between motion
vectors belonging to a moving object 200. Assume that the following motion vectors have
already been estimated: V(0,2) and V(0,1). Now a multi-temporal candidate motion vector
V(0,3) has to be calculated. This can be achieved by applying Equation 4:

$$V(0,3) = 2V(0,2) - V(0,1) \tag{4}$$

This means that V(0,3) is an extrapolated motion vector which is calculated by means of
subtraction of a forward motion vector from another forward motion vector which has been
multiplied by a predetermined constant.

Fig. 2D schematically shows another example of a relation between motion
vectors belonging to a moving object 200. Assume that the following motion vectors have
already been estimated: V(0,2) and V(0,3). Now a multi-temporal candidate motion vector
V(3,2) has to be calculated. This can be achieved by applying Equation 5:

$$V(3,2) = V(0,2) - V(0,3) \tag{5}$$

This means that V(3,2) is an interpolated motion vector which is calculated by means of
subtraction of a forward motion vector from another forward motion vector.
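The relations of Figs. 2A to 2D can be checked numerically. The sketch below assumes, purely for illustration, that V(c,r) is the displacement of the object from image c to image r and that the object moves with a constant velocity; the trajectory, the helper names and the numbers are hypothetical.

```python
def position(n):
    """Hypothetical object position in image n: constant velocity (3, -2) per image."""
    return (10 + 3 * n, 40 - 2 * n)


def V(c, r):
    """Assumed convention: displacement of the object from image c to image r."""
    pc, pr = position(c), position(r)
    return (pr[0] - pc[0], pr[1] - pc[1])


def add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])


def scale(k, a):
    return (k * a[0], k * a[1])


eq2 = sub(V(0, 3), V(0, 2))            # candidate for V(3,4)
eq3 = add(scale(3, V(3, 4)), V(3, 2))  # candidate for V(3,5)
eq4 = sub(scale(2, V(0, 2)), V(0, 1))  # candidate for V(0,3)
eq5 = sub(V(0, 2), V(0, 3))            # candidate for V(3,2)

print(eq2 == V(3, 4), eq3 == V(3, 5), eq4 == V(0, 3), eq5 == V(3, 2))
# prints: True True True True
```

Under this constant-velocity assumption all four candidates coincide with the true vectors. For other motion the extrapolated candidates are approximations, which is why they are tested against the match error like any other candidate.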

Fig. 3 schematically shows the relations between motion vectors and a number
of consecutive pictures IBBPBB as known in MPEG encoding. In MPEG, there are I, P, and
B picture types. Both I and P pictures serve as reference images. The P pictures are forward
predicted from the previous reference image. The B pictures are bi-directionally predicted
from a previous and a future reference image. A group of pictures (GOP) comprises subgroups
k of the form (I/P)BB...B(I/P). Notice that in Fig. 3 it is assumed that k = 2. The number of
pictures within subgroup 1 or subgroup 2, analogous to the prediction depth $M$ of a GOP,
is denoted by $M_1$ and $M_2$. In general, $M_k$ is not necessarily fixed. An alternative syntax is
used for the motion vectors. A forward motion vector, which is used in the prediction of the
i-th picture of the k-th subgroup, is denoted by $f_i^k$. The backward motion vector is denoted by
$b_i^k$.
Next, generalizations of the examples described in connection with Fig. 2A,
Fig. 2B, Fig. 2C and Fig. 2D will be provided. The alternative syntax is used for that. It is
assumed that $M_1 = M_2 = 3$. Equation (2) can be generalized to:

$$\tilde{f}_1^{k+1} = f_{M_k}^{k} - f_{M_k-1}^{k} \tag{6}$$

The underlying motion model is "most recent velocity". In this case, motion vectors
belonging to another subgroup are used to calculate the multi-temporal candidate motion
vector. Notice that taking into account the assumptions which are applicable for Fig. 3 yields:

$$\tilde{f}_1^{2} = f_3^{1} - f_2^{1} \tag{7}$$

This corresponds with the example provided in Fig. 2A.
Equation (3) can be generalized to:

$$\tilde{f}_2^{k+1} = 3 f_1^{k+1} + b_{M_k-1}^{k} \tag{8}$$

The underlying motion model is "most recent acceleration". In this case a motion vector
belonging to another subgroup is used together with a motion vector from the same subgroup
to calculate the multi-temporal candidate motion vector. Notice that taking into account the
assumptions which are applicable for Fig. 3 yields:

$$\tilde{f}_2^{2} = 3 f_1^{2} + b_2^{1} \tag{9}$$

This corresponds with the example provided in Fig. 2B.
Equation (4) can be generalized to:

$$\tilde{f}_i^{k} = 2 f_{i-1}^{k} - f_{i-2}^{k} \tag{10}$$

The underlying motion model is "most recent velocity". In this case motion vectors
belonging to the same subgroup are used to calculate the multi-temporal candidate motion
vector. Assume that $i = 3$. Taking into account the assumptions which are applicable for Fig.
3 yields:

$$\tilde{f}_3^{1} = 2 f_2^{1} - f_1^{1} \tag{11}$$

This corresponds with the example provided in Fig. 2C.

Equation (5) can be generalized to:

$$\tilde{b}_i^{k} = f_i^{k} - f_{M_k}^{k} \tag{12}$$

In this case motion vectors belonging to the same subgroup are used to calculate the
multi-temporal candidate motion vector. Assume that $i = 2$. Taking into account the assumptions
which are applicable for Fig. 3 yields:

$$\tilde{b}_2^{1} = f_2^{1} - f_3^{1} \tag{13}$$

This corresponds with the example provided in Fig. 2D.

Fig. 4 schematically shows a portion of a motion vector field 400 comprising
the motion vectors 402-410. The motion vector field 400 is related to a zoom. Although the
various motion vectors 402-410 are different, they contain shared information, i.e. the
parameters of the motion model. Extracting these parameters from previously calculated
motion vectors is the first step. The second step is to apply this information for the
calculation of a candidate motion vector, that is, a multi-spatial candidate motion vector.
The process resulting in this candidate motion vector can be based on interpolation and/or
extrapolation schemes which correspond with those described above.
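The two steps can be sketched for a pure zoom, where the motion vector grows linearly with the distance to the zoom centre. The sketch below is an assumption-laden illustration, not the method of the embodiment: it extracts a per-axis slope from two previously calculated vectors and then linearly interpolates or extrapolates a multi-spatial candidate at a third block position. All names and numbers are hypothetical.

```python
def multi_spatial_candidate(p1, v1, p2, v2, p3):
    """Linear inter/extrapolation of a motion vector at block position p3.

    p1, p2 : positions of two blocks with known motion vectors v1, v2
    p3     : position of the block for which a candidate is wanted
    Assumes p1 and p2 differ in both coordinates.
    """
    candidate = []
    for axis in range(2):
        # Step 1: extract the model parameter (zoom rate along this axis).
        slope = (v2[axis] - v1[axis]) / (p2[axis] - p1[axis])
        # Step 2: apply the parameter to the new block position.
        candidate.append(v1[axis] + slope * (p3[axis] - p1[axis]))
    return tuple(candidate)


# Hypothetical zoom about centre (32, 32) with rate 0.25: v(p) = 0.25 * (p - centre).
p1, v1 = (16, 16), (-4, -4)
p2, v2 = (48, 40), (4, 2)
print(multi_spatial_candidate(p1, v1, p2, v2, (64, 8)))  # prints: (8.0, -6.0)
```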
Fig. 5 schematically shows an embodiment of a motion estimation unit 500,
comprising:

- a generating means 502 for generating a set of candidate motion vectors for a
block of pixels of a current image;

- a match error unit 506 for calculating match errors of respective candidate
motion vectors of the block of pixels by summation of absolute differences between pixel
values of the block of pixels and pixel values of a reference image;

- a storage 504 for storing estimated motion vectors and the corresponding
match errors; and

- a selector 508 for selecting a current motion vector from the candidate
motion vectors by means of comparing the match errors of the respective candidate motion
vectors.

The input of the motion estimation unit 500 comprises images and is provided at an input
connector 510. The output of the motion estimation unit 500 consists of motion vector fields and is
provided at an output connector 512. The behavior of the motion estimation unit 500 is as
follows. First the generating means 502 generates a set of candidate motion vectors for a
block of pixels. This set might comprise random motion vectors or motion vectors directly
extracted from the set of previously estimated motion vectors as stored in the storage 504.
But the generating means 502 is also arranged to calculate a further candidate motion vector
on the basis of a first motion vector and a second motion vector, both belonging to the set of
previously estimated motion vectors. Such a calculation conforms to the Equations as
described in connection with any of the Figs. 2A-2D or 3, or to the concept as
described in connection with Fig. 4. After the set of candidate motion vectors is made, the
match error unit 506 calculates the match errors for these candidate motion vectors. Then the
selector 508 selects a current motion vector from the set of candidate motion vectors on the
basis of these match errors. This current motion vector is selected because its match error has
the lowest value. The current motion vector is also stored in the storage 504.
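The behavior described above can be summarized in a small Python sketch. It is a simplified illustration under stated assumptions, not the embodiment itself: the class name, the dictionary used as storage, the string keys, the single random candidate and the synthetic images are all hypothetical, and the further candidate is computed as "second minus first" only.

```python
import random


def sad(cur, ref, x, y, dx, dy, size):
    """SAD of a size x size block; infinite if the shifted block leaves the image."""
    height, width = len(ref), len(ref[0])
    if not (0 <= x + dx <= width - size and 0 <= y + dy <= height - size):
        return float("inf")
    return sum(abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
               for j in range(size) for i in range(size))


class MotionEstimationUnit:
    def __init__(self, block_size, seed=0):
        self.block_size = block_size
        self.storage = {}  # label -> (motion vector, match error)
        self.rng = random.Random(seed)

    def generate_candidates(self, first_key, second_key):
        """Stored vectors, one further (calculated) candidate and one random vector."""
        stored = [vector for vector, _ in self.storage.values()]
        first, second = self.storage[first_key][0], self.storage[second_key][0]
        further = (second[0] - first[0], second[1] - first[1])
        rand = (self.rng.randint(-1, 1), self.rng.randint(-1, 1))
        return stored + [further, rand]

    def estimate(self, cur, ref, x, y, first_key, second_key, store_as):
        candidates = self.generate_candidates(first_key, second_key)
        errors = [sad(cur, ref, x, y, dx, dy, self.block_size) for dx, dy in candidates]
        best = min(range(len(candidates)), key=errors.__getitem__)  # lowest match error
        self.storage[store_as] = (candidates[best], errors[best])
        return candidates[best]


def image(n, width=10, height=6):
    """Hypothetical image n: a bright 2x2 object moving one pixel right per image."""
    return [[200 if 2 <= row <= 3 and 1 + n <= col <= 2 + n else 0
             for col in range(width)] for row in range(height)]


unit = MotionEstimationUnit(block_size=2)
unit.storage["V(0,2)"] = ((2, 0), 0)  # previously estimated vectors (assumed values)
unit.storage["V(0,3)"] = ((3, 0), 0)
current = unit.estimate(image(3), image(4), 4, 2, "V(0,2)", "V(0,3)", "V(3,4)")
print(current)  # prints: (1, 0)
```

In this example neither stored vector matches the block, while the calculated candidate V(0,3) - V(0,2) does, so it is selected and stored for later use.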

Fig. 6 schematically shows an embodiment of a video encoder 600 that is
designed to transform an incoming sequence of uncompressed pictures into compressed
pictures. The video encoder 600 comprises:

- an encoder chain 602 having a begin and an end, and with successively: a
motion estimator 500, a discrete cosine transformer 626, a quantizer 628, and a run-level
encoder 629;

- a decoder chain 616 having a begin and an end, and with successively: a
run-level decoder 623, an inverse quantizer 622, an inverse discrete cosine transformer 620, and a
motion compensator 618;

- a variable length encoder 634; and

- a reference picture pool 603 to store previous reference pictures 630 and
future reference pictures 632.

The incoming sequence of uncompressed pictures enters the video encoder 600 at its input
connector 612. The coding of pictures is described on a MacroBlock basis, i.e. blocks of
16x16 pixels. Within each picture, MacroBlocks are coded in a sequence from left to right.
For a given MacroBlock, the coding mode is chosen. This depends on the picture type and
the effectiveness of motion compensated prediction. Depending on the coding mode, a
motion compensated prediction of the contents of the MacroBlock based on past and/or
future reference pictures is formed by the motion estimation unit 500. These reference
pictures are retrieved from the reference picture pool 603. The prediction is subtracted from
the actual data in the current MacroBlock, i.e. pixels in the uncompressed picture, to form a
prediction error. Note that a prediction error is a matrix of pixels. The prediction error is
input for the discrete cosine transformer 626, which divides the prediction error into 8x8
blocks of pixels and performs a discrete cosine transformation on each 8x8 block of pixels.
The resulting two-dimensional 8x8 block of DCT coefficients is input for the quantizer 628,
which performs a quantization. Quantization mainly affects the high frequencies. The human
visual system is less sensitive to picture distortions at higher frequencies. The quantized
two-dimensional 8x8 block of DCT coefficients is scanned in zigzag order and converted by
the run-level encoder 629 into a one-dimensional string of quantized DCT coefficients. This
string represents a compressed picture. Such a compressed picture can be stored in the
reference picture pool 603 for later usage, e.g. to serve as reference picture. A compressed
picture can also be converted into a variable length encoded string. This conversion is
performed by the variable length encoder 634. Besides the prediction error, other information,
e.g. the type of the picture and the motion vector field, is coded in a similar way.
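The zigzag scan and run-level conversion performed by the run-level encoder can be sketched as follows. This is a simplified illustration: the function names and the coefficient values are hypothetical, trailing zeros are simply dropped (a real coder signals an end of block), and any special treatment of the first coefficient is ignored.

```python
def zigzag_order(n=8):
    """(row, column) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + column indexes one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run from bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def run_level_encode(block):
    """Convert a quantized block into (run of zeros, non-zero level) pairs."""
    pairs = []
    run = 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


# Hypothetical quantized 8x8 block with four non-zero coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0], block[1][2] = 12, -3, 5, 1

print(zigzag_order()[:6])       # prints: [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(run_level_encode(block))  # prints: [(0, 12), (0, -3), (1, 5), (3, 1)]
```

Because quantization zeroes most high-frequency coefficients, the zigzag scan tends to group the zeros into long runs, which keeps the list of pairs short.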

Motion estimation requires reference pictures. Both previous reference
pictures and future reference pictures are reconstructed from compressed pictures by means
of the decoder chain 616. Compressed pictures are retrieved from the reference picture pool
603 when needed. They are successively processed by the run-level decoder 623, the
inverse quantizer 622, the inverse discrete cosine transformer 620 and the motion
compensator 618. These four units perform the inverse operations related to the four units of
the encoder chain 602, but in reverse order. After reconstruction the reference pictures are
temporarily stored in the reference picture pool 603 to be used for motion estimation for a
subsequent uncompressed picture.

Fig. 7 schematically shows elements of an image processing apparatus 700
comprising:

- receiving means 702 for receiving a signal representing images to be
displayed after some processing has been performed. The signal may be a broadcast signal
received via an antenna or cable but may also be a signal from a storage device like a VCR
(Video Cassette Recorder) or Digital Versatile Disk (DVD). The signal is provided at the
input connector 706.

- a motion estimation unit 500 as described in connection with Fig. 5;

- a motion compensated image processing unit 704; and

- a display device 706 for displaying the processed images. This display device
is optional.

The motion compensated image processing unit 704 requires images and motion vectors as
its input.

It should be noted that the above-mentioned embodiments illustrate rather than
limit the invention and that those skilled in the art will be able to design alternative
embodiments without departing from the scope of the appended claims. In the claims, any
reference signs placed between parentheses shall not be construed as limiting the claim.
The word "comprising" does not exclude the presence of elements or steps not listed in a
claim. The word "a" or "an" preceding an element does not exclude the presence of a
plurality of such elements. The invention can be implemented by means of hardware
comprising several distinct elements and by means of a suitably programmed computer. In
the unit claims enumerating several means, several of these means can be embodied by one
and the same item of hardware.



